Finding Topic-specific Strings in Text Categorization and Opinion Mining Contexts

Authors

  • Rémi Lavalley
  • Chloé Clavel
  • Marc El-Bèze
  • Patrice Bellot
Abstract

In this paper, we present a new probabilistic method for automatically extracting topic-specific strings in a text categorization context. The advantage of this method is twofold. First, it automatically identifies the expressions that characterize a specific topic category, for potential use in knowledge modelling. Second, it helps improve categorization results by providing the classifier with text spans that are more relevant than isolated words. The novelty of our approach thus lies not only in the method used to extract topic-specific strings but also in the adaptation of the traditional cosine similarity measure for text categorization. For the evaluation, we tackle two challenging corpora: movie reviews written by Internet users and manual transcriptions of call center conversations. On both tasks, we observed a gain in categorization results (between 1 and 8%).
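For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of categorization with the traditional cosine similarity measure over isolated words. It is not the authors' adapted measure or their string extraction method, neither of which is described on this page. The function names, the whitespace tokenizer, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count terms (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def train(labeled_docs):
    """Sum the term counts of each category's documents into one profile."""
    profiles = {}
    for text, label in labeled_docs:
        profiles.setdefault(label, Counter()).update(bag_of_words(text))
    return profiles


def classify(text, profiles):
    """Return the category whose profile is most similar to the text."""
    vector = bag_of_words(text)
    return max(profiles, key=lambda label: cosine(vector, profiles[label]))


# Toy data, invented for illustration only.
TRAINING = [
    ("a wonderful film with a great cast", "positive"),
    ("great story and wonderful acting", "positive"),
    ("a boring film with a weak plot", "negative"),
    ("weak acting and a boring story", "negative"),
]
PROFILES = train(TRAINING)
```

Calling `classify("wonderful cast", PROFILES)` compares the input against each category profile and returns the closest label. In a setup like the one the abstract outlines, the vector dimensions would presumably include multi-word strings rather than isolated words only, but how those strings are selected and weighted is not specified here.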


Similar resources

A review of text mining approaches and their function in discovering and extracting a topic

Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...


Opinion Mining and Topic Categorization with Novel Term Weighting

In this paper we investigate the efficiency of a novel term weighting algorithm for opinion mining and topic categorization of articles from newspapers and the Internet. We compare the novel term weighting technique with existing approaches such as TF-IDF and ConfWeight. The performance on the data from the text-mining campaigns DEFT’07 and DEFT’08 shows that the proposed method can compete with ...


Opinion Mining in Hungarian based on textual and graphical clues

Opinion Mining aims at recognizing and categorizing or extracting opinions found in unstructured text resources. It is one of the most dynamically evolving subdisciplines of Computational Linguistics and shows some resemblance to document classification and information extraction tasks. In this paper we propose a novel approach in Opinion Mining which combines Machine Learning models based on trad...


Data Mining and the Text Categorization Framework

The aim of this contribution is to show one of the most important applications of text mining. Much of the literature in this field gives great relevance to the classification task (Drucker et al., 1999; Nigam et al., 2000). The application contexts are several and multitask, from text filtering (Belkin & Croft, 1992) to word sense disambiguation (Gal...


Categorizing Unknown Text Patterns for Information Extraction Using a Search Result Mining Approach

An advanced information extraction system requires an effective text categorization technique to categorize extracted facts (text patterns) into a hierarchy of domain-specific topic categories. Text patterns are often short and their categorization is quite different from conventional document categorization. This paper proposes a Web mining approach that exploits Web resources to categorize un...




Publication date: 2010